Telegram Group & Telegram Channel
مقایسه زمانی BPE Tokenizer روی دو کتابخونه Hugging Face Tokenizers و OpenAI TikToken روی ولیدیشن دیتاست تاینی‌استوریز:

dataset = load_dataset("roneneldan/TinyStories")
texts = dataset["validation"]["text"]

# Load the GPT-2 tokenizer for both libraries
tiktokenizer = tiktoken.get_encoding("gpt2") # tiktoken
hf_tokenizer = Tokenizer.from_pretrained("gpt2") # Hugging Face tokenizers

# Measure tiktoken speed
start_time = time.time()
tiktoken_results = [tiktokenizer.encode(text) for text in texts]
tiktoken_time = time.time() - start_time

# Measure tokenizers speed
start_time = time.time()
hf_results = [hf_tokenizer.encode(text).ids for text in texts]
hf_time = time.time() - start_time

# Print results
print(f"tiktoken Time: {tiktoken_time:.4f} seconds")
print(f"tokenizers Time: {hf_time:.4f} seconds")

tiktoken Time: 2.6481 seconds
tokenizers Time: 16.7744 seconds



tg-me.com/pytorch_howsam/671
Create:
Last Update:

مقایسه زمانی BPE Tokenizer روی دو کتابخونه Hugging Face Tokenizers و OpenAI TikToken روی ولیدیشن دیتاست تاینی‌استوریز:

dataset = load_dataset("roneneldan/TinyStories")
texts = dataset["validation"]["text"]

# Load the GPT-2 tokenizer for both libraries
tiktokenizer = tiktoken.get_encoding("gpt2") # tiktoken
hf_tokenizer = Tokenizer.from_pretrained("gpt2") # Hugging Face tokenizers

# Measure tiktoken speed
start_time = time.time()
tiktoken_results = [tiktokenizer.encode(text) for text in texts]
tiktoken_time = time.time() - start_time

# Measure tokenizers speed
start_time = time.time()
hf_results = [hf_tokenizer.encode(text).ids for text in texts]
hf_time = time.time() - start_time

# Print results
print(f"tiktoken Time: {tiktoken_time:.4f} seconds")
print(f"tokenizers Time: {hf_time:.4f} seconds")

tiktoken Time: 2.6481 seconds
tokenizers Time: 16.7744 seconds

BY PyTorch Howsam


Warning: Undefined variable $i in /var/www/tg-me/post.php on line 283

Share with your friend now:
tg-me.com/pytorch_howsam/671

View MORE
Open in Telegram


PyTorch Howsam Telegram | DID YOU KNOW?

Date: |

The STAR Market, as is implied by the name, is heavily geared toward smaller innovative tech companies, in particular those engaged in strategically important fields, such as biopharmaceuticals, 5G technology, semiconductors, and new energy. The STAR Market currently has 340 listed securities. The STAR Market is seen as important for China’s high-tech and emerging industries, providing a space for smaller companies to raise capital in China. This is especially significant for technology companies that may be viewed with suspicion on overseas stock exchanges.

Unlimited members in Telegram group now

Telegram has made it easier for its users to communicate, as it has introduced a feature that allows more than 200,000 users in a group chat. However, if the users in a group chat move past 200,000, it changes into "Broadcast Group", but the feature comes with a restriction. Groups with close to 200k members can be converted to a Broadcast Group that allows unlimited members. Only admins can post in Broadcast Groups, but everyone can read along and participate in group Voice Chats," Telegram added.

PyTorch Howsam from cn


Telegram PyTorch Howsam
FROM USA